Abstract

In the light of a recently derived evolution equation for genetic algorithms 
we consider the schema theorem and the building block hypothesis. We derive a 
schema theorem based on the concept of effective fitness showing that schemata 
of higher than average effective fitness receive an exponentially increasing number 
of trials over time. The equation makes manifest the content of the building block 
hypothesis showing how fit schemata are constructed from fit sub-schemata. How- 
ever, we show that generically there is no preference for short, low-order schemata. 
In the case where schema reconstruction is favored over schema destruction large 
schemata tend to be favored. As a corollary of the evolution equation we prove 
Geiringer's theorem. 



Key Words: Schema Theorem, Building Block Hypothesis, Evolution equation, Ef- 
fective fitness. 



1 Introduction 



One of the most commonly asked questions about genetic algorithms (GAs) is: under 
what circumstances do GAs work well? Obviously an answer to this question would help 
immeasurably in knowing to which problems one can apply a GA and expect a high level 
of performance. However, to answer this question one has to answer a more fundamental 
question: how do GAs work? For example, in a typical optimization problem how does 
the GA arrive at a good solution? It is clear that in very complex problems this is not 
achieved via a random search in the state space. The search is structured. However,
the question remains as to what is the nature of this structure. To put this question 
another way, if we think of individual string bits as "degrees of freedom", the GA does not
exhaustively search through the different combinations of individual bits, i.e. a search in 
the entire state space. Rather it searches through a restricted space spanned by different 
combinations of "effective degrees of freedom" (EDOF), which are combinations of the 
more fundamental "microscopic" bit degrees of freedom. 

What exact form these EDOF take depends of course on the particular landscape 
under consideration. Hence, one might despair as to whether it was possible to say 
anything that applied to more than a specific case. However, it is not meaningless to try 
to understand if they exhibit generic properties, independent of the landscape, or at least 
properties that are common to a large class of possible landscapes. The building block
hypothesis and the schema theorem [10] attempt to identify such generic features
and as such have played an important role in GA theory, if one accepts that one of
the principal goals of a theory is to provide a framework within which one can gain a
qualitative understanding of the behavior of a system. The basic gist of the building
block hypothesis is that short, low-order, highly fit schemata play a preeminent role
in the evolution of a GA; i.e. that the relevant EDOF for a GA are short, low-order, 
highly fit schemata. The schema theorem tries to lend a more quantitative aspect to the 
hypothesis by showing that such schemata are indeed favored. This fact is deduced via 
an analysis of the destructive effects of crossover. However, as is well known, the schema 
theorem is an inequality and is such because it does not say anything precise about 
schema reconstruction. To understand better the interplay between schema destruction, 
schema reconstruction and schema length one requires an evolution equation that is 
exact, and where schemata are the fundamental objects considered. 

Various exact evolution equations have been derived previously: [7] wrote down
exact equations for two-bit problems. Later these equations were extended to three and
four-bit problems \cite{whitley}. These equations allowed for an explicit analysis of string
gains and losses. [25] also presented an algorithm for generating evolution equations
for larger problems that was equivalent to an earlier equation of Bridges and Goldberg.
Although exact, these equations are extremely unwieldy and it is difficult to infer
general conclusions from their analysis. Another related approach is that of Vose and
collaborators, which treats GA evolution as a Markov chain. One of the chief
drawbacks of all the above, with respect to an analysis of the schema theorem and the
building block hypothesis, is that the former are evolution equations for strings whereas
the latter refer to schemata. Evidently an evolution equation that is amenable to interpretation
and analysis that treats schemata as fundamental objects would be preferable. Such an
equation has been derived recently [17], [16] for the case of proportional selection and
1-point crossover. The chief aim of this paper is to analyze the schema theorem and the
building block hypothesis in the light of this equation.

Crucially, we will be able to quantify the effect of schema reconstruction relative to 
that of schema destruction. Traditionally, crossover as a source of schema disruption
has been emphasized [23], [13]. This idea is at the heart of the schema theorem and
the building block hypothesis. There has been some work towards a more positive point
of view of crossover vis-a-vis reconstruction [19], but mainly in the light of the
exploratory nature of crossover. Here, we will see exactly under what conditions schema 
reconstruction dominates destruction. 

In analyzing the consequences of the evolution equation we will especially emphasize 
two ideas: effective fitness and EDOF. With respect to the former we will show that: if 
one thinks intuitively of fitness as representing the ability of a schema to propagate then 
effective fitness is a more relevant concept than the conventional idea of fitness. We will 
formulate a schema theorem in terms of the effective fitness showing that schemata with 
high effective fitness receive an exponentially increasing number of trials as a function 
of time. The second key idea, already mentioned, is that of EDOF. Generically one can 
think of a schema as an EDOF. However, schemata offer for every string a decomposition
into $2^N$ different elements of a space with $3^N$ members. Not every decomposition will be
useful. In fact, typically only a small subset will be. So what do we mean by useful? EDOF,
if they are to have any utility whatsoever, should not be very strongly coupled. This is 
a notion that is intimately associated with how epistasis is distributed in the problem. 
This type of thinking is common to many fields and generally is associated with the 
idea of finding a basis for a highly non-linear problem wherein it decomposes into a set 
of fairly independent sub-problems. An important feature of complex systems is that 
the EDOF are "scale" dependent. This scale dependence very often takes the form of a 
time dependence wherein the EDOF are different at different stages of evolution. This 
complicates life greatly in that if we find a useful decomposition of a problem at time 
t we have no guarantee that it will remain a useful decomposition indefinitely into the 
future. 



2 Coarse Graining and Schemata 

As is well known to any scientist or engineer a good model of a system is one that captures 
the relevant features and ignores irrelevant details. The deemphasis of irrelevant details 
we can think of as a "coarse graining" . Of course, a great difficulty is that often what 
is relevant versus irrelevant depends on what one wants to say about the system, i.e. 
what level of description one requires. It also, more often than not, depends on time. 
One of the most obvious examples of this is evolution: the primitive constituents of 
life, amino acids, DNA, RNA etc, which represent the "microscopic" degrees of freedom, 
over time combined to form progressively more and more complicated EDOF such as 
cells, sponges, people etc. This evolution in time is intimately linked to an evolution in 
"scale" and a corresponding evolution in complexity. 

In a GA specifying all the bits of a string gives us the most fine grained, microscopic 
description possible. For strings of size N and a population of size n there are Nn degrees
of freedom and, for a binary alphabet, $O(n2^N)$ possible population states. Consider the
different classes of fitness maps that may be defined: first, $f_G : G \to R^+$, where
G denotes the space of genotypes (string states) and $f_G$ is the fitness function that
assigns a number to a given genotype; second, $f_Q : Q \to R^+$, where Q is the space
of phenotypes. These mappings may be explicitly time dependent. In fact, this will
normally be the case when the "environment" is time dependent. They may also be
injective or not, although the map $f_Q$ will usually be injective. If $f_G$ is many-to-one then
there exist "synonymous" genotypes, i.e. the mapping is degenerate. If we assume there
exists a map $\phi : G \to Q$ between genotype and phenotype then we have $f_G = f_Q \circ \phi$, i.e.
the composite map induces a fitness function on G. A schema, $\xi$, consists of $N_2 \le N$
defined bits. The defining length of the schema, $l$, is the distance between its two
extremal defining bits. The space of all schemata, S, may be partitioned according to
schema order; i.e. $S = \sum_{N_2} S_{N_2}$, where $S_{N_2}$ is the space of schemata of order $N_2$. The
mapping $g : G \to S$ between strings and schemata is many-to-one. The degree of
degeneracy of the map, $g_{N_2} : G \to S_{N_2}$, is $2^{N-N_2}$. Except for the trivial case of a
0-schema, maximal degeneracy occurs when $N_2 = 1$ where half of G is mapped onto one
schema. The fitness of a schema is the map $f_S : S \to R^+$, which is related to $f_G$ via
the composite map $f_S \circ g = f_G$. Explicitly,

$$f(\xi,t) = \frac{\sum_{C_i \supset \xi} f(C_i,t)\, n(C_i,t)}{\sum_{C_i \supset \xi} n(C_i,t)} \qquad (1)$$

where $f(C_i,t)$ is the fitness of string $C_i$ at time t, $n(C_i,t)$ is the expected number of
strings of type $C_i$ at time t and the sums are over all strings in the population that
contain $\xi$.
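
As an illustration of definition (1), the following minimal Python sketch computes the fitness of a schema as the count-weighted mean fitness of the strings that contain it. The function names `matches` and `schema_fitness`, the use of `*` for undefined positions and the small example population are assumptions made purely for illustration.

```python
from typing import Dict


def matches(string: str, schema: str) -> bool:
    """Return True if `string` contains `schema` ('*' marks an undefined bit)."""
    return len(string) == len(schema) and all(
        s == "*" or s == c for c, s in zip(string, schema)
    )


def schema_fitness(schema: str, counts: Dict[str, float],
                   fitness: Dict[str, float]) -> float:
    """Equation (1): count-weighted mean fitness of the strings containing the schema."""
    members = [c for c in counts if matches(c, schema)]
    total = sum(counts[c] for c in members)
    if total == 0:
        raise ValueError("schema is not present in the population")
    return sum(fitness[c] * counts[c] for c in members) / total


# Hypothetical 2-bit population: expected counts n(C_i, t) and string fitnesses.
counts = {"11": 3, "10": 1, "01": 2, "00": 4}
fitness = {"11": 2.0, "10": 2.0, "01": 2.0, "00": 1.0}

print(schema_fitness("1*", counts, fitness))  # averages over strings 11 and 10
print(schema_fitness("*0", counts, fitness))  # averages over strings 10 and 00
print(schema_fitness("**", counts, fitness))  # 0-schema: the population average
```

Note that the schema fitness depends on the current counts, so it generally changes in time even when the string fitnesses do not.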

As mentioned, the total number of schemata for a binary alphabet is $3^N$. Why go to
an even bigger space than the state space itself? One answer to this question is related
to the idea of coarse graining. In defining a schema we average over all strings that
contain the given schema. In such a sum we are summing over all possible values for the
string bits $C_i - \xi$ present in the population. A schema thus represents a coarse grained
degree of freedom because we are forfeiting explicit information about the out of schema
string bits. Clearly the lower the order of the schema the higher the degree of coarse
graining, the maximal coarse graining being associated with the maximally degenerate
schema where $N_2 = 1$.

A schema of order $N_2$ has only $N_2$ degrees of freedom and $2^{N_2}$ possible states. Given
that one of the fundamental characteristics of complex systems is the existence of a large 
number of degrees of freedom and an exponentially large state space any methodology 
that purports to reduce the number of EDOF will prove very useful. To see this in the 
context of an explicit example let us say that we wish to calculate the average fitness 
in a GA evolving according to proportional selection with strings of size N, where N is
a multiple of 2. The evolution equation for the expected number of strings of type $C_i$,
$n(C_i,t)$, is

$$n(C_i,t+1) = \frac{f(C_i)}{\bar{f}(t)}\, n(C_i,t). \qquad (2)$$

The average population fitness, $\bar{f}(t)$, for the case of a non-time dependent landscape
obeys the equation

$$\bar{f}(t+1) = \sum_i \frac{f^2(C_i)}{\bar{f}(t)}\, P(C_i,t) \qquad (3)$$

where $P(C_i,t) = n(C_i,t)/n$, n being the population size which we regard as being
constant. As proportional selection is a stochastic process, for small population sizes one
will typically see large fluctuations, i.e. in any given experiment one may well see large
deviations between the results of (2) and (3) and the corresponding experimental
quantities. However, taking averages over repeated experiments the results converge to those
of the above equations. In fact, in the infinite population limit $P(C_i,t)$ will converge
to the probability of finding string $C_i$ at time t. The string fitness maps every string
state to $R^+$. If the population is large then many strings will be represented and hence
many terms in $\bar{f}(t)$ will be non-zero. Thus, to calculate the evolution of $\bar{f}(t)$ one needs
to solve $\sim 2^N$ coupled equations. Let us instead take the following approach: we will
average over odd string positions in the population leaving strings, $C'_i$, (or rather now
schemata) of N/2 definite bits that satisfy

$$n(C'_i,t+1) = \frac{f(C'_i,t)}{\bar{f}(t)}\, n(C'_i,t) \qquad (4)$$

where now the fitness of $C'_i$ depends on time even if $f(C_i)$ didn't. To calculate $\bar{f}(t)$
one now only needs to solve $2^{N/2}$ coupled equations. One can repeat this process, each
step of coarse graining reducing the number of EDOF by one half, until we reach the
situation where a 1-schema has been reached. This cannot be further coarse grained
of course, except by passing to the trivial situation wherein all string bits are summed
over, i.e. a 0-schema. At this level the evolution equation for the effective string of size
one (1-schema $\alpha$) is

$$n(\alpha,t+1) = \frac{f(\alpha,t)}{\bar{f}(t)}\, n(\alpha,t) \qquad (5)$$

Now, $\alpha$ can only take two values, 1 and 0 say, hence $\bar{f}(t) = [f(1,t)n(1,t) + f(0,t)n(0,t)]/n$.
Thus, the problem of finding the average fitness has been reduced to that of solving a 
problem with one degree of freedom and two possible states! 
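
The reduction to a single degree of freedom can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical 4-bit landscape and the helper names `select`, `one_schema` and `average_fitness_two_ways`; it evolves the string proportions under proportional selection alone and compares the average fitness computed from all strings with the one computed from the two 1-schemata at a chosen position.

```python
from itertools import product
from typing import Dict, Tuple


def select(P: Dict[str, float], f: Dict[str, float]) -> Dict[str, float]:
    """One generation of proportional selection, equation (2), on proportions."""
    fbar = sum(f[c] * P[c] for c in P)
    return {c: f[c] * P[c] / fbar for c in P}


def one_schema(P: Dict[str, float], f: Dict[str, float],
               pos: int, bit: str) -> Tuple[float, float]:
    """Proportion and fitness of the 1-schema with value `bit` at position `pos`."""
    members = [c for c in P if c[pos] == bit]
    prop = sum(P[c] for c in members)
    fit = sum(f[c] * P[c] for c in members) / prop
    return prop, fit


def average_fitness_two_ways(P: Dict[str, float], f: Dict[str, float],
                             pos: int = 0) -> Tuple[float, float]:
    """Average fitness from all strings and from the two 1-schemata at `pos`."""
    from_strings = sum(f[c] * P[c] for c in P)
    from_schemata = 0.0
    for bit in "01":
        prop, fit = one_schema(P, f, pos, bit)
        from_schemata += fit * prop
    return from_strings, from_schemata


N = 4
strings = ["".join(bits) for bits in product("01", repeat=N)]
f = {c: 1.0 + c.count("1") for c in strings}  # hypothetical landscape
P = {c: 1.0 / len(strings) for c in strings}  # uniform initial population

for t in range(3):
    print(t, average_fitness_two_ways(P, f))
    P = select(P, f)
```

In this sketch the 1-schema fitness inside `one_schema` is still obtained by summing over strings, so the two numbers agree by construction; the coarse grained description becomes a genuine saving only if the schema fitnesses can be obtained or approximated without returning to the strings.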

So what's the catch? The principal, and more fundamental, problem is that the 
genetic operators, principally reproduction and crossover, are defined at the microscopic 
level. In other words, as can be seen in (1), to assign a fitness to a schema one has to sum
over the different strings in the population that contain it. Thus, to calculate quantities
associated with the coarse grained degrees of freedom one must consider the microscopic 
degrees of freedom. One might be tempted therefore to think that even though there 
is an apparent reduction in the number of EDOF the net gain is canceled out by the 
fact that one has to return to the microscopic degrees of freedom in order to calculate 
their evolution under the genetic operators. If one wished to calculate the dynamics 
exactly then the above would be true. However, returning to the idea of emphasizing 
the relevant degrees of freedom it may well be that in the averaging process certain 
ones are more important than others, therefore allowing one to neglect, or treat as a 
perturbation, the effect of the irrelevant ones. In particular, near a fixed point of the 
dynamics one might well expect to see a simplification. 

A second problem is that if we wish to ask a question about a particular string and 
we only have access to schemata of order $N_2 < N$ then the question will be impossible
to answer. In other words, if we are going to accept a coarse grained description then 
we can only ask questions about coarse grained variables. This in no way will affect the 
calculation of population variables such as average population fitness, standard deviation 
about the average fitness etc. Neither should it affect the ability of the GA to find an 
optimum as a fixed point of the dynamics as this can be represented in terms of optimal 
schemata. 

In the above we discussed a particular coarse graining which led to a certain, definite 
set of schemata of order N/2, N/4, ..., 1 associated with averaging over the odd bits of
each successive coarse grained string. Generally there are very many different coarse
grainings possible, $3^N - 1$ for a given string. Which are useful and which aren't? This
depends on the fitness landscape under consideration. What one wishes to do is to 
choose a coarse graining that gives rise to EDOF that are relatively weakly coupled. 
Finding such a coarse graining may well of course be very difficult. The coarse graining 
by factors of 2 above is a proposal for an algorithm to calculate GA evolution. Whether 
this particular coarse graining would be useful, as mentioned, depends on the fitness 
landscape. Although the method might seem somewhat artificial it is important to 
emphasize that such methods, based on the idea of the renormalization group (see for 
example Goldenfeld (1989) for a review) have proved to be extremely effective in many 
areas of physics and applied mathematics and have yielded very good results on canonical 
optimization problems such as the Traveling Salesman problem. 

Later we will emphasize that 1-schemata are very useful coarse grained variables, as 
being of size 1 they are immune to the effects of crossover. In terms of 1-schemata the 
average fitness in the population is 

$$\bar{f}(t) = \sum_{\alpha=1}^{N} f(\alpha,t)\, P(\alpha,t) \qquad (6)$$

where the sum is over the N possible 1-schemata, $f(\alpha,t)$ is the fitness of the 1-schema
$\alpha$ at time t and $P(\alpha,t) = n(\alpha,t)/n$ is the expected proportion of strings present in the
population that contain the 1-schema $\alpha$.

3 Schema Equation

In this section we will review the derivation of the schema evolution equation of [17],
[16]. Given that the microscopic degrees of freedom are strings, however, we will first
derive an equation for strings evolving under the effects of the three genetic operators: 
proportional selection, crossover and mutation. We will throughout only consider simple 
one-point crossover. The analysis can be repeated, with analogous results, for the case 
of n-point crossover. 

We will consider the change in the expected number, $n(\xi,t)$, of strings that contain
a particular schema $\xi$, of order $N_2$ and length $l \ge N_2$, as a function of time (generation).
If mutation is carried out after crossover one finds that the expected relative proportion
of $C_i$ in the population, $P(C_i,t) = n(C_i,t)/n$, satisfies

$$P(C_i,t+1) = \mathcal{P}(C_i \to C_i)\, P_c(C_i,t) + \sum_{C_j \neq C_i} \mathcal{P}(C_j \to C_i)\, P_c(C_j,t) \qquad (7)$$

where the effective mutation coefficients are: $\mathcal{P}(C_i \to C_i) = \prod_{k=1}^{N} (1 - p_m(k))$, which is the
probability that string i remains unmutated, and $\mathcal{P}(C_j \to C_i)$, the probability that string j
is mutated into string i, given by

$$\mathcal{P}(C_j \to C_i) = \prod_{k \in \{C_j - C_i\}} p_m(k) \prod_{k \in \{C_j - C_i\}_c} (1 - p_m(k)) \qquad (8)$$

where $p_m(k)$ is the mutation probability of bit k. For simplicity we assume it to be
constant, though the equations are essentially unchanged if we also include a
dependence on time. $\{C_j - C_i\}$ is the set of bits that differ between $C_j$ and $C_i$ and $\{C_j - C_i\}_c$, the
complement of this set, is the set of bits that are the same. In the limit where $p_m$ is
uniform, $\mathcal{P}(C_i \to C_i) = (1 - p_m)^N$ and $\mathcal{P}(C_j \to C_i) = p_m^{d^H(i,j)} (1 - p_m)^{N - d^H(i,j)}$, where $d^H(i,j)$
is the Hamming distance between the strings $C_i$ and $C_j$. The quantity $P_c(C_i,t)$ is the
expected proportion of strings of type $C_i$ in the population after selection and crossover.
Explicitly

$$P_c(C_i,t) = P'(C_i,t) - \frac{p_c}{N-1} \sum_{C_j \neq C_i} \sum_{k=1}^{N-1} \mathcal{C}^{(1)}_{C_iC_j}(k)\, P'(C_i,t)\, P'(C_j,t) + \frac{p_c}{N-1} \sum_{C_j \neq C_i} \sum_{C_l \neq C_i} \sum_{k=1}^{N-1} \mathcal{C}^{(2)}_{C_jC_l}(k)\, P'(C_j,t)\, P'(C_l,t) \qquad (9)$$

where $p_c$ is the crossover probability, k is the crossover point, and the coefficients $\mathcal{C}^{(1)}_{C_iC_j}(k)$
and $\mathcal{C}^{(2)}_{C_jC_l}(k)$ represent, respectively, the probability that, given that $C_i$ was one of the
parents, it is destroyed by the crossover process, and the probability that, given that neither
parent was $C_i$, it is created by the crossover process. Explicitly

$$\mathcal{C}^{(1)}_{C_iC_j}(k) = \theta(d^H_L(i,j))\, \theta(d^H_R(i,j)) \qquad (10)$$

and

$$\mathcal{C}^{(2)}_{C_jC_l}(k) = \frac{1}{2} \left[ \delta(d^H_L(i,j))\, \delta(d^H_R(i,l)) + \delta(d^H_R(i,j))\, \delta(d^H_L(i,l)) \right] \qquad (11)$$

where $d^H_R(i,j)$ is the Hamming distance between the right halves of the strings $C_i$ and $C_j$,
"right" being defined relative to the crossover point, with the other quantities in (10) and
(11) being similarly defined. $\theta(x) = 1$ for $x > 0$ and is 0 for $x = 0$, whilst $\delta(x) = 0\ \forall x \neq 0$
and $\delta(0) = 1$. Finally, $P'(C_i,t) = (f(C_i,t)/\bar{f}(t))P(C_i,t)$ gives the expected proportion of
strings $C_i$ after the selection step. Note that $\mathcal{C}^{(1)}_{C_iC_j}(k)$ and $\mathcal{C}^{(2)}_{C_jC_l}(k)$ are properties of the
crossover process itself and therefore population independent. The equation (7) yields
an exact expression for the expectation values, $n(C_i,t)$, and in the limit $n \to \infty$ yields
the correct probability distribution governing the GA evolution.

The evolution equation we have derived takes into account exactly the effects of 
destruction and reconstruction of strings and, at least at the formal level, has the same 
content as other exact formulations of GA dynamics. It should also be formally
equivalent to the equation of Bridges and Goldberg. Before passing to the case of
schemata it is interesting to put the equation into a simpler form. To see this, consider
first the destruction term. The matrix (10) restricts the sum to those $C_j$ that differ from
$C_i$ in at least one bit both to the left and to the right of the crossover point. One can
convert the sum over $C_j$ into an unrestricted sum by subtracting off those $C_j$ that have
$d^H_L(i,j) = 0$ and/or $d^H_R(i,j) = 0$. Similarly one may write the reconstruction term as

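$$\frac{p_c}{N-1} \sum_{k=1}^{N-1} \sum_{C_j \supset C_i^L} \sum_{C_l \supset C_i^R} P'(C_j,t)\, P'(C_l,t) \qquad (12)$$

where $C_i^L$ is the part of $C_i$ to the left of the crossover point and correspondingly for $C_i^R$.
However, by definition

$$f(C_i^L,t) = \frac{\sum_{C_j \supset C_i^L} f(C_j,t)\, n(C_j,t)}{\sum_{C_j \supset C_i^L} n(C_j,t)} \qquad (13)$$

where $\sum_{C_j \supset C_i^L} n(C_j,t)$ is the total number of strings in the population that contain $C_i^L$.
The final form of the string equation without mutation thus becomes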

$$P(C_i,t+1) = P'(C_i,t) - \frac{p_c}{N-1} \sum_{k=1}^{N-1} \left( P'(C_i,t) - P'(C_i^L,t)\, P'(C_i^R,t) \right) \qquad (14)$$

with

$$P'(C_i^L,t) = \sum_{C_j \supset C_i^L} P'(C_j,t) \qquad (15)$$

and similarly for $P'(C_i^R,t)$. It is important to note here that in this form the evolution
equation shows that crossover explicitly introduces the idea of a schema and the
consequent notion of a coarse graining. $C_i^L$ and $C_i^R$ are schemata of order and length k and
$N - k$ respectively.
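
To make the structure of the string equation (14) concrete, the following minimal sketch iterates it for one generation. It assumes strings of length N of at least 2 stored as keys of a dictionary of proportions; the function name `crossover_generation` and the example landscape are hypothetical choices for illustration only.

```python
from itertools import product
from typing import Dict


def crossover_generation(P: Dict[str, float], f: Dict[str, float],
                         pc: float) -> Dict[str, float]:
    """One generation of proportional selection and 1-point crossover, equation (14)."""
    N = len(next(iter(P)))  # string length, assumed >= 2
    fbar = sum(f[c] * P[c] for c in P)
    Pp = {c: f[c] * P[c] / fbar for c in P}  # P'(C_i, t) after selection

    new_P = {}
    for c in P:
        net_loss = 0.0
        for k in range(1, N):  # crossover points k = 1 .. N-1
            # Equation (15): coarse grain over everything right (left) of the cut.
            left = sum(Pp[d] for d in P if d[:k] == c[:k])
            right = sum(Pp[d] for d in P if d[k:] == c[k:])
            net_loss += Pp[c] - left * right
        new_P[c] = Pp[c] - pc / (N - 1) * net_loss
    return new_P


N = 3
strings = ["".join(bits) for bits in product("01", repeat=N)]
f = {c: 1.0 + c.count("1") for c in strings}  # hypothetical landscape
P = {c: 1.0 / len(strings) for c in strings}  # uniform initial population
P_next = crossover_generation(P, f, pc=0.7)
print(sum(P_next.values()))  # the proportions remain normalized
```

The quantities `left` and `right` are exactly the coarse grained proportions of the left and right parts of the string, which is where schemata enter the string dynamics.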

The analogous equation for schema evolution can be found by summing equation
(7) over all strings that contain the schema of interest $\xi$. The result is

$$P(\xi,t+1) = \mathcal{P}(\xi \to \xi)\, P_c(\xi,t) + \sum_{\xi_i} \mathcal{P}(\xi_i \to \xi)\, P_c(\xi_i,t) \qquad (16)$$

where

$$P_c(\xi,t) = P'(\xi,t) - \frac{p_c}{N-1} \sum_{k=1}^{l-1} \left( P'(\xi,t) - P'(\xi_L,t)\, P'(\xi_R,t) \right) \qquad (17)$$

and the sum in (16) is over all schemata $\xi_i$ that differ by at least one bit from $\xi$ in one
of the $N_2$ defining bits of $\xi$. All other quantities are the schema analogs of quantities
defined in (7). The effective mutation coefficients $\mathcal{P}(\xi \to \xi)$ and $\mathcal{P}(\xi_i \to \xi)$ are

$$\mathcal{P}(\xi \to \xi) = (1 - p_m)^{N_2} \quad \text{and} \quad \mathcal{P}(\xi_i \to \xi) = p_m^{d^H(\xi,\xi_i)} (1 - p_m)^{N_2 - d^H(\xi,\xi_i)} \qquad (18)$$

where $d^H(\xi,\xi_i)$ is the Hamming distance between the schemata $\xi$ and $\xi_i$.

A very interesting feature of the evolution equations we have presented is their 
form invariance under a coarse graining. Starting with the string equation any coarse 
graining to schemata of order $N_2 < N$ yields an equation identical in form to that of its
predecessor. 
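
The effective mutation coefficients of equation (18) depend only on the Hamming distance over the defining bits. A minimal sketch, assuming a uniform mutation rate and schemata written as strings of their defined bits only (the names `hamming` and `mutation_coefficient` are illustrative):

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Hamming distance over the defined bits of two schemata of the same order."""
    return sum(x != y for x, y in zip(a, b))


def mutation_coefficient(source: str, target: str, pm: float) -> float:
    """Equation (18): probability that schema `source` mutates into `target`.

    Both arguments list only the N_2 defined bits, e.g. "101" for an order-3 schema.
    """
    n2 = len(target)
    d = hamming(source, target)
    return pm ** d * (1.0 - pm) ** (n2 - d)


pm = 0.05
target = "101"
sources = ["".join(bits) for bits in product("01", repeat=len(target))]
total = sum(mutation_coefficient(s, target, pm) for s in sources)
print(mutation_coefficient(target, target, pm), total)
```

Summed over all source schemata on the same defining positions, the coefficients add up to one, since every one of the $2^{N_2}$ schemata reaches the target with the stated binomial weight.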

4 Effective Fitness 

Having derived the schema evolution equation, before turning to an analysis of its many 
features, we wish to digress on the notion of fitness. The main intuitive idea behind 
fitness is that fitter parents have more offspring. In equation (16), neglecting for the
moment mutation and crossover, taking the limit of a continuous time evolution one
finds

$$P(\xi,t) = P(\xi,0)\, e^{\int_0^t s_\xi\, dt'} \qquad (19)$$

where $s_\xi = f(\xi,t)/\bar{f}(t) - 1$ is the selective advantage of the schema $\xi$. If $s_\xi > 0$ the expected
number of $\xi$ grows, whilst if $s_\xi < 0$ it decreases. However, consider the following two
simple cases. First, consider the effect of mutation without crossover in the context of 
a model that consists of 2-schemata, 11, 01, 10, 00, where each schema can mutate to 
the two adjacent ones when the states 11, 10, 00, 01 are placed clockwise on a circle. 
For example, 11 can mutate to 10 or 01 but not to 00. This is evidently the limit where 
two-bit mutations are completely negligible compared to one-bit mutations. We assume 
a simple degenerate fitness landscape: $f(11) = f(01) = f(10) = 2$, $f(00) = 1$. In a
random population, $P(11) = \ldots = P(00) = 1/4$. If there is uniform probability $p_m$ for
each schema to mutate to an adjacent one then the evolution equation that describes
this system is

$$P(i,t+1) = (1 - 2p_m)\, P'(i,t) + p_m \left( P'(i-1,t) + P'(i+1,t) \right) \qquad (20)$$

For $p_m = 0$ the steady state population is $P(11) = P(01) = P(10) = 1/3$, $P(00) = 0$.
Thus we see the synonym symmetry of the landscape associated with the degeneracy of
the states 11, 10 and 01 is unbroken. However, for $p_m > 0$, the schemata distribution at
$t = 1$, starting from a random distribution at $t = 0$, is $P(11) = 2/7$,
$P(01) = P(10) = (2 - p_m)/7$, $P(00) = (1 + 2p_m)/7$. Thus, we see that there is an induced breaking of
the landscape synonym symmetry due to the effects of mutation. In other words the 
population is induced to flow along what in the fitness landscape is a flat direction. 
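
The numbers quoted for this toy model can be reproduced with a few lines of code. The sketch below is a minimal illustration of equation (20), assuming the four 2-schemata are stored in their clockwise order on the circle; the names `ORDER`, `FITNESS` and `mutate_generation` and the value of the mutation rate are illustrative choices.

```python
ORDER = ["11", "10", "00", "01"]  # clockwise on the circle, as in the text
FITNESS = {"11": 2.0, "10": 2.0, "00": 1.0, "01": 2.0}


def mutate_generation(P, pm):
    """One generation of equation (20): selection, then nearest-neighbor mutation."""
    fbar = sum(FITNESS[s] * P[s] for s in ORDER)
    Pp = [FITNESS[s] * P[s] / fbar for s in ORDER]  # P'(i, t) after selection
    n = len(ORDER)
    return {
        s: (1 - 2 * pm) * Pp[i] + pm * (Pp[(i - 1) % n] + Pp[(i + 1) % n])
        for i, s in enumerate(ORDER)
    }


pm = 0.1  # hypothetical mutation rate
P0 = {s: 0.25 for s in ORDER}  # random initial population
P1 = mutate_generation(P0, pm)
print(P1)
```

After one generation the schemata 01 and 10, whose neighbor on the circle is the less fit 00, are already less frequent than 11 even though all three have the same fitness.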

As a second example consider the 2-schemata problem now with crossover but 
neglecting mutation, and with a fitness landscape where $f(01) = f(10) = 0$ and
$f(11) = f(00) = 1$. The steady state solution of the schema evolution equation is

$$P(11) = P(00) = \frac{1}{2} \left( 1 - \frac{p_c}{2} \frac{l-1}{N-1} \right), \qquad P(01) = P(10) = \frac{p_c}{4} \frac{l-1}{N-1} \qquad (21)$$

For $l = N$ and $p_c = 1$ we see that half the steady state population is composed of strings
that have zero fitness! 
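
The steady state (21) can be checked by iterating the schema equation (17) for a 2-schema. The following is a minimal sketch under the assumptions of the toy model, namely that the four schema fitnesses are fixed numbers; the function name `crossover_generation` and the parameter values are illustrative.

```python
def crossover_generation(P, f, pc, l, N):
    """Selection plus 1-point crossover for a 2-schema of defining length l, eq. (17)."""
    fbar = sum(f[s] * P[s] for s in P)
    Pp = {s: f[s] * P[s] / fbar for s in P}  # P' after selection
    # Every cut between the two defining bits splits the 2-schema into the same
    # pair of 1-schemata, so the sum over k = 1 .. l-1 gives a factor (l - 1).
    cut = pc * (l - 1) / (N - 1)
    new_P = {}
    for s in P:
        left = sum(Pp[r] for r in P if r[0] == s[0])    # 1-schema on the left bit
        right = sum(Pp[r] for r in P if r[1] == s[1])   # 1-schema on the right bit
        new_P[s] = Pp[s] - cut * (Pp[s] - left * right)
    return new_P


f = {"11": 1.0, "00": 1.0, "01": 0.0, "10": 0.0}
P = {s: 0.25 for s in f}
pc, l, N = 1.0, 8, 8  # hypothetical parameters with l = N
for _ in range(20):
    P = crossover_generation(P, f, pc, l, N)
print(P)
```

With these parameters the zero fitness schemata 01 and 10 are regenerated by crossover in every generation and together make up half of the population.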

Although the above examples are artificial they serve to make the point that the 
genetic operators can radically change the "effective" landscape in which the population 
evolves. The actual "bare" landscape associated purely with selection in the above 
offers very little intuition as to the true population evolution. Real populations can flow 
rapidly along flat directions and strings may be present even if they have zero fitness. To 
take this into account we propose using an "effective" fitness function [17], [16] defined
via

$$P(\xi,t+1) = \frac{f_{\rm eff}(\xi,t)}{\bar{f}(t)}\, P(\xi,t) \qquad (22)$$

Comparing with equation (16) one finds

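$$f_{\rm eff}(\xi,t) = \mathcal{P}(\xi \to \xi)\, f(\xi,t) \left[ 1 - \frac{p_c}{N-1} \sum_{k=1}^{l-1} \left( \frac{P'(\xi,t) - P'(\xi_L,t)\, P'(\xi_R,t)}{P'(\xi,t)} \right) \right] + \sum_{\xi_i} \mathcal{P}(\xi_i \to \xi)\, \frac{\bar{f}(t)}{P(\xi,t)} \left[ P'(\xi_i,t) - \frac{p_c}{N-1} \sum_{k=1}^{l-1} \left( P'(\xi_i,t) - P'(\xi_{iL},t)\, P'(\xi_{iR},t) \right) \right] \qquad (23)$$

In the limit $p_m \to 0$, $p_c \to 0$ we see that $f_{\rm eff}(\xi,t) \to f(\xi,t)$. The above also leads to the
idea of an effective selection coefficient, $s_{\rm eff} = f_{\rm eff}(\xi,t)/\bar{f}(t) - 1$, that measures directly
selective pressure. If we think of $s_{\rm eff}$ as being approximately constant in the vicinity of
time $t_0$, then $s_{\rm eff}(t_0)$ gives us the exponential rate of increase or decrease of growth of
the schema $\xi$ at time $t_0$. In the limit of a continuous time evolution the solution of (22)
is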

$$P(\xi,t) = P(\xi,0)\, e^{\int_0^t s_{\rm eff}\, dt'} \qquad (24)$$

In the case of the toy examples above: for mutations without crossover

$$f_{\rm eff}(i,t) = f_i + \frac{p_m}{P(i,t)} \left( f_{i-1} P(i-1,t) + f_{i+1} P(i+1,t) - 2 f_i P(i,t) \right) \qquad (25)$$

At $t = 0$, $f_{\rm eff}(11,0) = 2$, $f_{\rm eff}(01,0) = f_{\rm eff}(10,0) = 2 - p_m$ and $f_{\rm eff}(00,0) = 1 + 2p_m$. Thus
we see that the effective fitness function provides a selective pressure by selecting among
the degenerate schemata those that have a higher probability to produce fit descendants.
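
The effective fitness of the mutation toy model can be computed in two ways: directly from its definition (22), as the ratio of successive proportions, and from the closed form (25). The minimal sketch below compares the two; the helper names and the mutation rate are assumptions made for illustration.

```python
ORDER = ["11", "10", "00", "01"]  # clockwise on the circle
FITNESS = {"11": 2.0, "10": 2.0, "00": 1.0, "01": 2.0}


def step(P, pm):
    """Equation (20): selection followed by nearest-neighbor mutation."""
    fbar = sum(FITNESS[s] * P[s] for s in ORDER)
    Pp = [FITNESS[s] * P[s] / fbar for s in ORDER]
    n = len(ORDER)
    return {s: (1 - 2 * pm) * Pp[i] + pm * (Pp[(i - 1) % n] + Pp[(i + 1) % n])
            for i, s in enumerate(ORDER)}


def f_eff_definition(P, pm):
    """Equation (22): f_eff = fbar(t) * P(t+1) / P(t)."""
    fbar = sum(FITNESS[s] * P[s] for s in ORDER)
    P_next = step(P, pm)
    return {s: fbar * P_next[s] / P[s] for s in ORDER}


def f_eff_closed_form(P, pm):
    """Equation (25) evaluated directly."""
    n = len(ORDER)
    out = {}
    for i, s in enumerate(ORDER):
        prev_s, next_s = ORDER[(i - 1) % n], ORDER[(i + 1) % n]
        flow = (FITNESS[prev_s] * P[prev_s] + FITNESS[next_s] * P[next_s]
                - 2 * FITNESS[s] * P[s])
        out[s] = FITNESS[s] + pm * flow / P[s]
    return out


pm = 0.1  # hypothetical mutation rate
P0 = {s: 0.25 for s in ORDER}
print(f_eff_definition(P0, pm))
print(f_eff_closed_form(P0, pm))
```

Both routes give the same values, which differ from the bare fitnesses for every schema except 11.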

Of course, the definition of effective fitness is not unique. Another natural definition 
follows from the split into those terms of the evolution equation that are linear in P(£, t) 
and those "source" terms that are independent of it. For instance, in the case of selection 
and crossover we have 

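$$P(\xi,t+1) = \frac{f'_{\rm eff}(\xi,t)}{\bar{f}(t)}\, P(\xi,t) + j(t) \qquad (26)$$

where $f'_{\rm eff}(\xi,t) = \left( 1 - p_c \frac{l-1}{N-1} \right) f(\xi,t)$ and $j(t) = \frac{p_c}{N-1} \sum_{k=1}^{l-1} P'(\xi_L,t)\, P'(\xi_R,t)$. The
corresponding effective selection coefficient is $s'_{\rm eff} = \left( 1 - p_c \frac{l-1}{N-1} \right) \frac{f(\xi,t)}{\bar{f}(t)} - 1$. In the limit of a
continuous time evolution (26) may be formally integrated to yield

$$P(\xi,t) = e^{\int_0^t s'_{\rm eff}(t')\, dt'} P(\xi,0) + e^{\int_0^t s'_{\rm eff}(t')\, dt'} \int_0^t j(t')\, e^{-\int_0^{t'} s'_{\rm eff}(t'')\, dt''}\, dt' \qquad (27)$$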

5 Schema Theorem and Building Blocks 

We now turn to a discussion of the schema theorem and the building block hypothesis. 
The standard "schema theorem" [10], or fundamental theorem of GAs, states that
for a schema, $\xi$, of length $l$ evolving according to proportional selection and 1-point
crossover

$$P(\xi,t+1) \ge P'(\xi,t) \left( 1 - p_c \frac{l-1}{N-1} \right) \qquad (28)$$

and has the interpretation that schemata of higher than average fitness will be allocated 
exponentially more trials over time. The conventional schema theorem only provides us 
with a lower bound for the expected number of schemata due to the fact that it does 
not explicitly account for schema reconstruction. Equation (16), however, exactly takes
into account the effect of schema reconstruction due to both mutation and crossover.
Together with the definition of effective fitness in equation (23) of the previous section
it allows one to state a new schema theorem:

Schema Theorem

$$P(\xi,t+1) = \frac{f_{\rm eff}(\xi,t)}{\bar{f}(t)}\, P(\xi,t) \qquad (29)$$



The interpretation of this equation is clear and analogous to the old schema theorem: 
schemata of higher than average effective fitness will be allocated an "exponentially" 
increasing number of trials over time. We put the word exponentially in quotes as 
the real exponent, $\int s_{\rm eff}\, dt'$, is not, except for very simple cases such as a flat fitness
landscape, of the form $\alpha t$, where $\alpha$ is a constant. The illustrative examples of the last
section show that there is potentially a strong difference between the standard selection 
based fitness and effective fitness as the latter takes into account the effect of all genetic 
operators. The fact that strings with zero selective fitness can receive an exponentially 
increasing number of trials shows quite clearly that effective fitness is a more relevant 
concept. In this sense our schema theorem does not just state the obvious — that fit 
schemata that are preserved by the crossover operator will prosper. Once again this 
emphasizes the role of the destructive effect of crossover. The novel element here is 
seeing exactly how important is schema reconstruction. In fact generically it is the 
dominant contribution. 

The schema evolution equation we have derived possesses many interesting features,
one of the most interesting being the way that it relates evolution in time to different
levels of coarse graining. To see this we first return to the string evolution equation (7).
Up to this point we have presented our results in almost the most general way possible,
for any type of landscape and taking into account both crossover and mutation.
Throughout the rest of the paper we will concentrate more on the effect of crossover
and thus neglect mutation. The reason for this is that we will mainly be concerned
with the importance of schema length vis-a-vis the building block hypothesis. Mutation,
being a strictly local operator, will not play a major role in this discussion. Note that
this equation is written entirely in terms of the fundamental degrees of freedom, the
strings. In passing to the form (14) we have performed a coarse graining by summing
over all strings that contain $C_i^L$ irrespective of what lies to the right of the crossover
point, and similarly for strings containing $C_i^R$. The implication is that the very nature of
crossover imposes on us the idea of coarse graining, and more specifically the idea of a
schema, given that $C_i^L$ and $C_i^R$ define schemata of order and size k and $N - k$ respectively.



In order to solve the equation (14) we need to know $P(C_i^L,t)$ and $P(C_i^R,t)$. However,
these in turn obey evolution equations of the form

$$P(C_i^L,t+1) = P'(C_i^L,t) - \frac{p_c}{N-1} \sum_{m=1}^{k-1} \left( P'(C_i^L,t) - P'(C_i^{LL},t)\, P'(C_i^{LR},t) \right) \qquad (30)$$

where $C_i^{LL}$ and $C_i^{LR}$ are the left and right parts of $C_i^L$, left and right being defined relative 
to the crossover point $m$, where $m < k$. Now, $C_i^{LL}$ and $C_i^{LR}$ as schemata are more coarse 
grained than $C_i^L$, i.e. they are of lower order. Clearly this pattern of behavior continues, 
i.e. in order to calculate $P(C_i,t+1)$ one requires $P(C_i^L,t)$ and $P(C_i^R,t)$, which in their 
turn require $P(C_i^{LL},t-1)$, $P(C_i^{LR},t-1)$, $P(C_i^{RL},t-1)$ and $P(C_i^{RR},t-1)$, etc. For each 
step back in time we pass to more coarse grained degrees of freedom. $C_i$ thought of as 
a schema is of higher order than $C_i^L$ or $C_i^R$, which in their turn are of higher order than 
$C_i^{LL}$, $C_i^{LR}$, $C_i^{RL}$ and $C_i^{RR}$. So where does this process stop? The maximally coarse grained 
EDOF are 1-schemata. It is not possible to cut a 1-schema and hence crossover is 
explicitly neutral, i.e. 1-schemata obey the equation 

$$P(i, t+1) = P'(i, t) \qquad (31)$$

As a simple example consider a 4-bit string ijkl. The hierarchical structure of one 
possible ancestral tree can be written as 

t+1 ijkl 

t ijk,l ij,kl i,jkl 

t-1 ij,k i,jk i,j k,l jk,l j,kl 

t-2 i,j j,k j,k k,l 

This tree shows only the effect of the reconstruction term in the schema equation over 
the space of 3 generations. Of course there are many other processes that contribute 
to the appearance of ijkl at time t + 1 that involve various combinations of schema 
destruction and reconstruction. As far as pure schema reconstruction is concerned, 
however, we see that 1-schemata play a privileged role as they represent the ultimate 
building blocks. For an $N$-bit string the maximum number of time steps before all 
ancestors are 1-schemata is $N - 1$. 
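
The counting argument above can be checked with a short sketch. It assumes a schema is represented as a contiguous Python string and that one reconstruction step corresponds to one cut; the names `cuts` and `max_steps` are ours and purely illustrative.

```python
from functools import lru_cache


def cuts(block):
    """All one-point splits of a contiguous block into (left, right) parts."""
    return [(block[:m], block[m:]) for m in range(1, len(block))]


@lru_cache(maxsize=None)
def max_steps(block):
    """Largest number of generations back before every ancestor is a 1-schema."""
    if len(block) == 1:
        return 0  # a 1-schema cannot be cut by crossover
    # One generation back the block is built from a left and a right part;
    # the deepest branch determines when all ancestors are 1-schemata.
    return 1 + max(max(max_steps(left), max_steps(right))
                   for left, right in cuts(block))


if __name__ == "__main__":
    print(cuts("ijkl"))       # the three reconstruction channels at time t
    print(max_steps("ijkl"))  # depth of the deepest ancestral branch
```

For the 4-bit example the three cuts reproduce the row labelled t in the tree, and the deepest branch has depth 3, in agreement with the bound $N - 1$.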

All the above equally applies to a generic schema, $\xi$, composed of schemata $\xi_L$ and 
$\xi_R$, which in their turn are composed of the schemata $\xi_{LL}$, $\xi_{LR}$, $\xi_{RL}$ and $\xi_{RR}$, etc. It should 
be clear that the idea of building blocks is manifest in the very structure of our evolution 
equations. $\xi_{LL}$, $\xi_{LR}$, $\xi_{RL}$ and $\xi_{RR}$ are building blocks for $\xi_L$ and $\xi_R$, which in their turn 
are building blocks for $\xi$. The ultimate building blocks are of course the 1-schemata. In 
the above example of a 4-bit string or schema the four building blocks of order one, i, j, 
k and l, combine to form building blocks of order two, ij and kl, which in turn combine 
with the building blocks of order one to form building blocks of order three, ijk and jkl. 
The building blocks of order three combine with the blocks of order one and the blocks 
of order two combine together to give blocks of order four, and so the process continues. 

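In terms of the effective fitness, $f'_{\rm eff}(\xi, t)$, introduced previously 

$$P(\xi,t) = e^{\int_0^t s_{\rm eff}(\xi,t')\,dt'} P(\xi,0) + \frac{p_c\, e^{\int_0^t s_{\rm eff}(\xi,t')\,dt'}}{N-1} \sum_{k=1}^{l-1} \int_0^t P'(\xi_L,t')\,P'(\xi_R,t')\, e^{-\int_0^{t'} s_{\rm eff}(\xi,t'')\,dt''}\,dt' \qquad (32)$$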

Up to now we have been able to analyze a general landscape. To arrive at more explicit, 
analytic formulae in an arbitrary landscape is prohibitively difficult. We will therefore 
temporarily restrict our attention to some more restrictive but simpler cases. We start 
with the case of a flat fitness landscape. In this case $s_{\rm eff} = -p_c (l - 1)/(N - 1)$, hence 

$$P(\xi,t) = e^{-p_c\frac{(l-1)}{(N-1)}t} P(\xi,0) + e^{-p_c\frac{(l-1)}{(N-1)}t}\, \frac{p_c}{N-1} \sum_{k=1}^{l-1} \int_0^t P'(\xi_L,t')\,P'(\xi_R,t')\, e^{p_c\frac{(l-1)}{(N-1)}t'}\,dt' \qquad (33)$$

Notice that dependence on the initial condition, $P(\xi, 0)$, is exponentially damped unless 
$\xi$ happens to be a 1-schema, the solution of the 1-schemata equation being 

P(i,t) = P(i,0) (34) 

An immediate consequence is that when considering the source term describing recon- 
struction the only non-zero terms that need to be taken into account are those which 
arise from 1-schemata, as any higher order term will always have an accompanying ex- 
ponential damping factor. Thus we see that the fixed point distribution for a GA with 
crossover evolving in a flat fitness landscape is 

$$P^*(\xi) = \lim_{t\to\infty} P(\xi, t) = \prod_{i=1}^{N_2} P(i, 0) \qquad (35)$$

which is basically Geiringer's Theorem || in the context of schema distributions and 
simple crossover. We see here that the theorem appears in an extremely simple way as 
a consequence of the solution of the evolution equation. 

Note that this fixed point distribution arises purely from the effects of reconstruction, 
the absence of which leads to a pure exponential damping and the unphysical behavior 
$P(\xi) \to 0$. We can also see from the above that a version of Geiringer's theorem will 
also hold in a more general non-flat landscape where selection is only very weak, where 
what we mean by weak is that $f_\xi/\bar{f} \sim (1 + \epsilon)$ and $\epsilon < p_c (l - 1)/(N - 1)$ $\forall\, l > 1$. Under such 
circumstances once again anything other than a 1-schema will be associated with an 
exponential damping factor. A distinction between the two cases, however, is that for 
a flat fitness landscape the fixed point is fixed by the initial proportions of the various 
1-schemata as there is no competition between them. Here, however, due to the non-trivial 
landscape certain 1-schemata are preferred over others. A concrete example of 
such a landscape would be $f_{\xi} = 1 + \sum_i \epsilon_i$ where $\sum_i \epsilon_i < \epsilon$ and $\epsilon_i$ is the fitness of 
the $i$th bit. Note there is no need to restrict to a linear fitness function here; arbitrary 
epistasis is allowed as long as it does not lead to large fitness deviations away from the 
mean. In this case $f_\xi/\bar{f} < 1 + \epsilon$. 

So what is the analog of the building block hypothesis here? Our schema theorem 
states that schemata of above average effective fitness will be allocated "exponentially" 
more trials over time. In the way the evolution equation is structured we see that the 
effective fitness in terms of the effects of crossover consists of a destruction term and a 
reconstruction term. Inherent in the structure of the reconstruction term is a form of 
the building block hypothesis: that higher order schemata are built from fit, shorter, 
lower order schemata. If $P'(\xi, t) > P'(\xi_L, t)P'(\xi_R, t)$ then the effects of destruction will 
outweigh those of reconstruction, whilst if $P'(\xi, t) < P'(\xi_L, t)P'(\xi_R, t)$ reconstruction will 
dominate. The content of this inequality is that reconstruction will dominate destruction 
if the probability to select the parts of a schema is greater than the probability to select 
the whole schema. Once again this is a general conclusion valid for any landscape. 
To give a more analytic slant we restrict to the case of two-schemata in a flat fitness 
landscape wherein one finds 



Thus we see that the effect of reconstruction is greater than that of destruction if i and 
j are negatively correlated. Notice that if reconstruction is more important then its 
contribution is maximized by maximizing the schema length, $l$. In other 
words large, rather than small, schemata are favored! 
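
A minimal sketch of this statement follows. It assumes that in a flat landscape the two-schema ij of length $l$ obeys the recursion $P(ij,t+1) = P(ij,t) - p_c\frac{l-1}{N-1}\left(P(ij,t) - P(i,t)P(j,t)\right)$, which is the general schema equation with $P' = P$ and 1-schema parts; the function names and the numerical values are illustrative only.

```python
def two_schema_step(p_ij, p_i, p_j, pc, l, N):
    """One generation of the assumed flat-landscape two-schema recursion."""
    rate = pc * (l - 1) / (N - 1)  # probability that a cut separates i and j
    return p_ij - rate * (p_ij - p_i * p_j)


def one_step_gain(p_ij, p_i, p_j, pc, l, N):
    """Net change of P(ij) in one generation (reconstruction minus destruction)."""
    return two_schema_step(p_ij, p_i, p_j, pc, l, N) - p_ij


if __name__ == "__main__":
    # Negatively correlated bits: P(ij) = 0.10 < P(i) P(j) = 0.25.
    for l in (2, 4, 8):
        print(l, one_step_gain(0.10, 0.5, 0.5, pc=0.8, l=l, N=8))
```

With negatively correlated bits the net gain is positive and grows with $l$, whereas for positively correlated bits the sign is reversed and longer schemata lose more.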

In general the fitness landscape itself induces correlations between $\xi_L$ and $\xi_R$. In 
this case there is a competition between the (anti-)correlating effect of the landscape 
and the mixing effect of crossover. Selection itself more often than not induces an 
anti-correlation between fit schemata parts, rather than a positive correlation. Indeed, in the 
neutral case of a non-epistatic landscape one has $1 + 2\delta f_\xi < (1 + 2\delta f_{\xi_L})(1 + 2\delta f_{\xi_R})$, 
where $\delta f_\xi$, $\delta f_{\xi_L}$ and $\delta f_{\xi_R}$ are the fitness deviations of the schemata $\xi$, $\xi_L$ and $\xi_R$ from 
an average fitness which we have normalized to one half. Thus we see that selection 
induces an anti-correlation when $\delta f_{\xi_L}, \delta f_{\xi_R} > 0$ and hence in an uncorrelated initial 
population, $P'(\xi, t) < P'(\xi_L, t)P'(\xi_R, t)$. This means that crossover plays an important 
role in allowing both parts of a successful schema to appear in the same individual. The 
effect of crossover is to weaken but not cancel completely the anti-correlations induced 
by selection and thus make it easier to find the whole schema. Indeed, it is possible 
to show that for a non-epistatic landscape the contribution to population fitness 
from all schemata of length $l$, starting with a random initial population at time $t$, is 
independent of $l$ at time $t + 1$ and is an increasing function of $l$ for large $l$ at time $t + 2$. 
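
The anti-correlation induced by selection can be illustrated with a two-bit sketch. It assumes proportional selection, $P'(s) = f(s)P(s)/\bar{f}$, a uniform (uncorrelated) initial population, and a hypothetical additive parametrisation $f(x_1, x_2) = \mathrm{base} + a x_1 + b x_2$ of a non-epistatic landscape; none of these names or numbers come from the derivation above.

```python
from itertools import product


def select(P, f):
    """Proportional selection: P'(s) = f(s) P(s) / mean fitness."""
    fbar = sum(f[s] * P[s] for s in P)
    return {s: f[s] * P[s] / fbar for s in P}


def correlation_after_selection(a, b, base=0.5):
    """P'(11) - P'(1*) P'(*1) for an additive two-bit landscape."""
    f = {(x1, x2): base + a * x1 + b * x2
         for x1, x2 in product((0, 1), repeat=2)}
    P = {s: 0.25 for s in f}              # uncorrelated initial population
    Pp = select(P, f)
    p_first = Pp[(1, 0)] + Pp[(1, 1)]     # 1-schema on the first bit
    p_second = Pp[(0, 1)] + Pp[(1, 1)]    # 1-schema on the second bit
    return Pp[(1, 1)] - p_first * p_second


if __name__ == "__main__":
    print(correlation_after_selection(0.2, 0.3))
```

When both bits are advantageous the returned value is negative, i.e. the whole schema is selected less often than the product of its parts, which is the situation in which reconstruction dominates.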



More complicated landscapes have to be examined on a case-by-case basis. Of course, 
it is always possible to invent a landscape where there is a fitness advantage associated 
with bits that are close together. However, it is equally easy to find one where there is a 
fitness advantage for bits that are widely separated. The non-epistatic landscapes above 
are neutral in this respect and therefore any results about the nature of schemata and 
building blocks are a reflection of the geometric effect of crossover and not associated with 
bit-bit correlations induced by the landscape itself. We now have ample experimental 
evidence that this is the case as well for "generic" landscapes with epistasis such as 
the Kauffman NK models. This evidence will be published elsewhere. A particularly 
interesting example of epistasis is deception as it has played an important role in the 
theory of GAs 0. The very nature of deception is such that the bits of a schema are 
less selected than the whole and hence we can see from the schema equation that in this 
circumstance destruction will outweigh reconstruction. However, this will only be totally 
deceptive if all possible schema reconstruction channels $\xi_L + \xi_R \to \xi$ are deceptive. For a 
schema of order $N_2$ there are $N_2 - 1$ such channels. Thus, for $N_2$ large it will typically be 
quite unlikely that all channels will be deceptive. If there exist non-deceptive channels 
then it is probable that the population will evolve in those directions. In fact, as the 
example of a two-schema shows, for every deceptive channel there is a non-deceptive 
one. One may explicitly see this from 

$$P(11, t+1) = P'(11, t) - p_c \left(\frac{l-1}{N-1}\right) \left( P'(11, t)\,P'(00, t) - P'(01, t)\,P'(10, t) \right)$$

$$P(01, t+1) = P'(01, t) - p_c \left(\frac{l-1}{N-1}\right) \left( P'(01, t)\,P'(10, t) - P'(11, t)\,P'(00, t) \right) \qquad (37)$$

Here 11-channel deception, i.e. $P'(11, t)P'(00, t) > P'(01, t)P'(10, t)$, implies that the 
01-channel is non-deceptive. However, this is not much consolation if the 11-schema 
happens to be the optimum. If we start with a random population then 11-channel 
deception is equivalent to the statement $f_{\rm eff}(11) < f(11)$. For something as simple as 
the two-schema problem there is only a single 11-channel. For the 4-bit schema 
ijkl we see that there are six reconstruction channels in total. There are various ways 
to end up with a totally deceptive problem. For instance, the three channels $ijk + l \to ijkl$, 
$ij + kl \to ijkl$ and $i + jkl \to ijkl$ might all be deceptive. Alternatively all the 1-schemata 
$\to$ 2-schemata channels might be deceptive. Generically the deviation of the effective 
fitness from the selective fitness will offer a reasonable measure of deception. 
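
The two-schema statements above can be checked numerically with the following sketch. It assumes proportional selection, $P'(s) = f(s)P(s)/\bar{f}$, and takes the effective fitness to be defined through $P(\xi,t+1) = (f_{\rm eff}(\xi,t)/\bar{f}(t))\,P(\xi,t)$; the updates for 00 and 10 are obtained from equation (37) by symmetry. The fitness values and function names are hypothetical and chosen only for illustration.

```python
def selected(P, f):
    """Proportional selection; returns P' and the mean fitness."""
    fbar = sum(f[s] * P[s] for s in P)
    return {s: f[s] * P[s] / fbar for s in P}, fbar


def two_schema_update(P, f, pc, l, N):
    """Equation (37), extended by symmetry to all four two-schemata."""
    Pp, fbar = selected(P, f)
    rate = pc * (l - 1) / (N - 1)
    d = Pp["11"] * Pp["00"] - Pp["01"] * Pp["10"]  # > 0: 11-channel deceptive
    new = {"11": Pp["11"] - rate * d, "00": Pp["00"] - rate * d,
           "01": Pp["01"] + rate * d, "10": Pp["10"] + rate * d}
    return new, fbar


def effective_fitness(P, f, pc, l, N):
    """f_eff(s) = fbar * P(s, t+1) / P(s, t) (assumed definition)."""
    new, fbar = two_schema_update(P, f, pc, l, N)
    return {s: fbar * new[s] / P[s] for s in P}


if __name__ == "__main__":
    P = {s: 0.25 for s in ("00", "01", "10", "11")}      # random population
    f = {"11": 1.0, "00": 0.9, "01": 0.5, "10": 0.5}      # illustrative values
    print(effective_fitness(P, f, pc=0.8, l=4, N=8))
```

With these values the 11-channel is deceptive, so the effective fitness of 11 falls below its selective fitness while that of 01 rises above it, which illustrates that the deceptive and non-deceptive channels come in pairs.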



6 Conclusions 

In this paper we have analyzed the Schema Theorem and the Building Block Hypothesis 
based on an exact evolution equation for GAs. At the level of the microscopic degrees 
of freedom, the strings, we established that the action of crossover by its very nature 
introduces the notion of a schema, the probability to reconstruct a given string being 
dependent on the probabilities for finding the right and left parts of the string relative 
to the crossover point in the two parents. These probabilities involve a coarse graining, 
i.e. an averaging over all strings that contain the constituent parts of the string, and 
hence represent schema probabilities. We saw that the same equation, after a suitable 
coarse graining, also described the evolution of any arbitrary schema. 

One might enquire as to what advantages a formulation based on schemata, as has 
been presented here, has over other existing formulations, such as the Vose Markov 
chain model. Indeed, the value of schemata and the Schema theorem in understanding 
GA evolution has been seriously questioned P, [20, ITJl O]. There are many possible 
answers to this question: first a pragmatic one, that all "things" are made out of 
"building blocks", whether they be tables, giraffes or computer programmes. Having an 
exact, amenable description of complex systems from the microscopic point of view is 
a vain hope. Complex systems and complex behaviour can much better be understood 
in terms of EDOF. EDOF, almost by definition, are much fewer in number than the 
microscopic degrees of freedom and hence, in principle, would offer a computationally 
simpler picture. However, the number of ways of combining the microscopic degrees of 
freedom into EDOF is very large, hence one might think that such a description is even 
more costly than one based on the microscopic degrees of freedom such as the Vose model. 
This would be true if in analysing the GA one had to search through all the possible 
"coarse grainings" available. For a given landscape, however, a preferred coarse graining 
will often suggest itself. Secondly, we believe strongly that approximation schemes for 
solving GA evolution equations will be much more forthcoming via a formulation in 
terms of schemata wherein one may appeal to all the intuition and machinery of the 
renormalization group. 

We introduced the notion of effective fitness showing through explicit examples that 
it was a more relevant concept than pure selective fitness in governing the reproductive 
success of a schema. Based on this concept of effective fitness and our evolution equation 
we introduced a new schema theorem that showed that schemata of high effective fitness 
received an exponentially increasing number of trials as a function of time. We then went 
on to discuss the building block hypothesis. One of the more remarkable features of our 
equation is that it implicitly contains a version of the latter, in that the structure of the 
reconstruction term encodes in an ancestral tree the relation between a given schema and 
its more coarse grained ancestors as a function of time. This ancestral tree terminates 
at 1-schemata, which are in some sense the ultimate building blocks as they cannot be 
destroyed by crossover. We also showed that generically there is no preference for short, 
low-order schemata. In fact, if schema reconstruction dominates, the opposite is true: 
typically large schemata will be favored. Only in deceptive problems does it generally 
seem that short schemata will be favored, and then only in totally deceptive problems 
as the system will tend to seek out the non-deceptive channels if they exist. 

There are many points of departure from the present work to future research. On the 
theoretical side it will be very interesting to see if other exact results besides Geiringer's 
theorem follow very simply from our evolution equation. A fundamental issue is trying 
to find approximation schemes within which the equations can be solved, as for a general 
landscape an exact solution will be impossible. In this respect, as mentioned, techniques 
familiar from statistical mechanics such as the renormalization group might well prove 
very useful. In fact the very structure of our evolution equation is very similar to that of 
a renormalization group equation, a theme we shall return to in a future publication. It 
is of course necessary to verify the equations numerically. Some work in this direction has 
already been done [17] and further work has confirmed its qualitative conclusions [18]. In 
this respect one has to tread carefully, as the interplay between selection and crossover 
can be very subtle as the work on Royal Road functions [|J has shown. Although they are 
very simple, we favor preliminary analytic analyses based on non-epistatic landscapes 
where one knows that there is no intrinsic inter-bit linkage due to the fitness landscape 
and therefore one can study the geometric effects of crossover in a more uncluttered 
environment. 